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Abstract: We consider nonparametric regression in the context of functional data, 
that is, when a random sample of functions is observed on a fine grid. We obtain 

a functional asymptotic normality result allowing to build simxiltancous confidence 



o 

bands (SCB) for various estimation and inference tasks. Two applications to a 

SCB procedure for the regression function and to a goodness-of-fit test for curvi- 

^t"! linear regression models are proposed. The first one has improved accuracy upon 

the other available methods while the second can detect local departures from a 
^ . parametric shape, as opposed to the usual goodness-of-fit tests which only track 

global departures. A numerical study of the SCB procedures and an illustration 
C/) with a speech data set are provided. 

Key words and phrases: Nonparametric regression, functional data, functional 
^ asymptotic normality, simultaneous confidence bands, goodness-of-fit test 
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1. Introduction 

In function estimation problems, simultaneous confidence bands (SCB) pro- 

00 

(—•^ vide a unified set of graphical and analytical tools to harness tasks such as data 

exploration, model specification or validation, assessment of variabiUty in esti- 
IL" mation, prediction, and inference. 

*^ In the usual setting where the target function is observed only once at a fi- 

nite number of points with independent measurement errors, the construction of 
SCB has been extensively studied. For instance, in the context of nonparametric 
regression on which we focus in this paper, Eubank and Speckman (1993) and 
Wang and Yang (2009) have used strong invariance principles to build SCB in 
fixed and random designs. Johansen and Johnstone (1990) and Sun and Loader 
(1994) have applied the celebrated "tube formulas", which turn the calculation 
of simultaneous coverage probabilities into the simpler geometric computation of 
tubes' volumes, to simultaneous prediction bands and significance tests for pro- 
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jection pursuit regression, and to bias-corrected confidence regions with linear 
multivariate estimators, respectively. Other SCB procedures rely on bootstrap- 
ping (e.g. Neumann and Polzehl, 1998) or on simultaneous confidence intervals 
(SCI) followed by interpolation arguments (e.g. Hall and Titterington, 1988). 
We also point to Baraud (2004) for the computation of SCI on an increasing 
number of (fixed) design points and to Deheuvels and Mason (2004) for SCB 
with asymptotic coverage level 100%. 

In the case of time series, the construction of confidence regions proves more 
difficult and has received less attention in the literature. Robinson (1997) derives 
a SCI procedure that works under short-range, long-range and negative depen- 
dence. Wu and Zhao (2007) build SCB for the trend and a test for structural 
breaks using a strong invariancc principle. Wang (2009) provides SCB based on 
constant or linear splines. The case of a random design is studied in Zhao and 
Wu (2008). 

We turn to the case of functional data, for which the statistical objects 
under study are viewed as functions rather than scalars/ vectors and are generally 
observed on a dense grid. We note that this setting has attracted considerable 
interest over the recent years due to the now routine collection of high frequency 
data allowed by technology. (Numerous examples of areas of applications can 
be found in the books of Ramsay and Silverman, 2005, and Ferraty and Vieu, 
2006.) In the functional data framework where both the numbers of sampled 
functions, say n, and of design points, say p, may vary, Degras (2008) showed 
that all linear smoothers have an asymptotic variance of order n~^. In the same 
paper SCI are derived and compared to Bonferroni- and Scheffe-type intervals. 
Degras (2009) builds SCB for the regression function by coupling a functional 
central limit theorem [CLT] with a limit result on the supremum of a Gaussian 
process. 

The present work provides a functional asymptotic normality result that 
serves as a building block for estimation and inference in nonparametric regres- 
sion with functional data. We present two applications of this result to the band 
estimation of the regression function and to a goodness-of-fit test for curvilinear 
regression models. To the best of our knowledge, these tasks have not yet been 
addressed in the functional data setup. The proposed band estimation procedure 
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corrects some shortcomings of Degras (2009) by fully accounting for the covari- 
ance structure of the data-generating process. The goodness-of-fit test relies on 
a SCB for the difference between the regression function and its projection onto 
the null space. It can detect local departures, as opposed to other tests based on 
residual sums of squares or L2 norms that only track global departures. 

The remainder of the paper is organized as follows. Section 2 presents the 
regression model and estimator under study. Section 3 establishes the main result 
of functional asymptotic normality. The normal SCB for the regression function 
and the goodness-of-fit test are constructed in Section 4 and studied numerically 
in Section 5 along with bootstrap SCB and a pseudo-likelihood ratio test. Sec- 
tion 6 illustrates the use of SCB methods with a speech data set. A discussion 
is provided in Section 7. The proofs are deferred to the Appendix. 

2. Model and local linear estimator 

Let {Yij,Xj), 1 < i < n, 1 < j < p, he repeated measurements on a random 
sample of n experimental units, where Yij stands for the observed (scalar) re- 
sponse on the zth unit at the fixed value Xj of a variable x. It is assumed that 
X varies in a compact subset of for some d > 1, say [0, 1]'' without loss of 
generality. Consider the regression model 

Yij = fi{xj ) + Zi {xj )+eij (2.1) 

where /x is an unknown smooth function, the Zi are independent copies of a 
random process Z = {Z{x) : x G [0, l]'^} with mean zero and covariance function 
R, and the e^- arc random errors having mean zero. A triangular array structure 
is assumed for the data as n varies (in particular p = p{n) and Xj = Xj{n,p)). 
The regression function /j, may be viewed as a population mean response while 
the Zi represent individual departures from /x. In this paper we restrict our 
attention to the case d G {1,2} for simplicity but our results extend to higher 
dimensions. The following assumptions are needed for our asymptotic study: 

(A.l) The function jj, has bounded (partial) derivatives on [0, l]'^ up to order 2. 

(A.2) With probability one, \Z{x) - Z{x')\ < M\\x - x'W^^ for all x,x' G [0,1]'^, 
where M is a random variable (r.v.) of finite variance, /3 > is a constant, 
and II • II is a norm on [0, l]*^. 
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(A. 3) The Xj form a regular grid generated by a product density f{ti, . . . ,td) = 
11^=1 fk{tk)i where the fk are continuous and positive densities on [0, 1]. It 
holds that {xj : I < j < p} = {{xj^^i, Xj^^a) ■ < 3k < PkA < ^ < d} 
where J^'"'" h{t)dt = In particular p = Ut=iPk. 

(A. 4) n = o{^mink=i^...^d{pt)) and n^/^^'^^ ^og{p) = o{p) as n,p ^ oo. 

(A. 5) The random vectors {en, . . . , Eip)^, i = 1, . . . , n are mutually independent, 
independent of the Zi, and have the same normal distribution A'p(0,V). 
The eigenvalues of the covariance matrix V are uniformly bounded in n, p. 

Note that assumptions (A.1)-(A.5) are tailored for the functional data framework 
wherein typically, the design points are balanced and taken on a regular grid, the 
observed random processes are smooth, and the design size p is large enough 
relative to the sample size n so as to accommodate (A. 4). 

We recall here the definition of the local linear estimator (e.g. Fan, 1992). 
Let us denote by (■, ■)d the cuclidcan scalar product in R"' and use arithmetic 
operations in a componentwise sense. Let K be a kernel function on M'^ which 
we take nonnegative, Lipschitz-continuous, with support [—1, 1]'' and such that 
K{0) > 0. Let h = {hi,. . . , ha) > be a vector of bandwidths. Write the data 
averages as Yj = Y17=i 1 ^ i ^ P- For a given location x G [0, 1]*^, the 
local linear estimator /x(x) is defined as Po, where {Po, l3i) is the solution of the 
minimization problem 

p 



mm 

3o,/3i)elR'^+ 



This estimator can be expressed as 



^ (Yj -Po-{Pi, {xj - x) Uy K (^) . (2.2) 

7 = 1 ^ ' 



^i{x) = Y,Wi{x)Yj, (2.3) 



where Wj{x) = ^ and 

2J-j=iWj{x) 



Wj{x) = ^ (^S2{x) - {xj - x)si(x)) K (^^^^^-j-^^ 

1 P whend = l, (2.4) 
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or 

Wj{x) 



Suix)s22{x) - 5^2(2;) + {S02ix)si2ix) - SQi{x)s22ix)) {xj - x)W 



1 / X,- - X 



phih2 V ^ 



+ (soi(x)si2(x) - S02{x)sii{x)) {xj - X^ 

sm{x) = k,l = 0,1,2 

(2.5) 



when d = 2, with the notations z = {z^^\ z^^^) and z^'^') = 1 for all z €zM? 



3. Functional asymptotic normality 

We start with a definition. Let || • ||oo be the supremum norm on the space of 
continuous functions C([0, l]'^). A sequence of random elements of C([0, 1]*^) 
is said to converge weakly to a limit X in C([0, l]'^) if E(0(X„)) — ;> E((/)(X)) as 
n — )• 00 for all bounded, uniformly continuous functional (/> on (C([0, 1]°'), || • ||oo) 
(e.g. Pollard, 1990, p. 44). Let Q{m,C) denote a real-valued Gaussian process 
indexed by [0, 1]°' with arbitrary mean and covariance functions m and C . We 
are now in position to state the main result. 



Theorem 1. Assume (A.1)-(A.5) in model (2.1) with d G {1,2}. Consider the 



local linear estimator ]2 defined in (2.3)-(2.5) with a bandwidth h = h{n,p) such 



that: — )• and {p/ ^og{p))Y\k=i^k 00 as n,p — >• 00. Then y/n{'jl — fi) 
converges weakly to Q{0,R) in C([0, l]'^). 

Remarks. 

1. Convergence rate. The fact that the normalizing rate y/n depends neither on 
p nor on h is consistent with the literature. It reflects the fact that (/i — fi) 
is essentially a smoothed version of Z = Y17=i ^'i-^ whose covariance 
structure (R/n) is essentially unaffected by smoothing or discretization. 

2. Regularity of Z. The conclusion of Theorem 1 holds under weaker conditions 
than the stochastic Holder continuity (A. 2), e.g. when Z is mean-square 
continuous and has bounded variations (Degras, 2009). However (A. 2) is 
needed for the SCB and test procedure of Section 2. 

3. Joint growth of n and p. The growth conditions (A. 4) enforce that (a power 
of) p is large enough relative to n. This ensures the existence of a bandwidth 
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h = h{n,p) small enough to make the squared bias of Jl negligible before 
its variance but large enough to smooth out the measurement errors £ij 
uniformly as n,p — >■ oo. (A. 4) can be weakened (allowing for larger n and 
smaller p) by assuming a higher order of differentiability for /j, in (A.l) and 
using higher order local polynomial estimators or bias reduction techniques. 

4. Measurement errors. The uniform bound on the covariance matrix V in 
(A. 5) accommodates various forms of dependence such as short-range de- 
pendence and ARMA or mixing processes. The normality assumption is not 
essential to the results, however the decrease rates in the tail probabilities 
of the Eij influence the size of h needed to smooth out these errors. For in- 
stance, if the normality assumption is dropped then the factor (p/ log(p)) in 
the condition (p/ log(p)) 11^=1 /i-fe — > oo in Theorem 1 becomes p^^^ whereas 
if the Eij are assumed to be uniformly bounded, the factor is equal to p. 

5. Bandwidth selection. For simplicity Theorem 1 is presented with a determin- 
istic h but it also holds when h depends on the data {Yij,Xj) in a way that 
Cia{n,p) < h{n,p) < C2a{n,p), where a{n,p) is a deterministic sequence 
satisfying the conditions of Theorem 1 and Ci, C2 > are constants. Hence, 
suitable plug-in or cross-validation methods (e.g. Hart and Wehrly, 1993) 
can be used to select h. Also note that in the present context of functional 
data, the asymptotic variance of is of order n^^ and is only affected 
by h through a second-order term in 0{hn~^) (sec ibid.). Therefore, the 
strategy adopted here to render the bias (of order h'^) negligible before the 
variance is compatible with the optimization in h of the asymptotic mean 
squared error of //(x): it does not slow down the convergence. 

6. Longitudinal data. In contradistinction to the functional data setup, asymp- 
totic normality results in C([0, 1]*^) cannot be obtained in the longitudinal 
data setup where typically, many random functions are observed each at 
a few time points. In this setup, nonparametric estimators mostly average 
data across the sample units, which are independent, and not within, where 
the random process structure plays. Therefore they converge pointwise to 
a Gaussian white noise process at the usual regression rates (Yao, 2007). 
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4. Applications 

4.1 Simultaneous confidence bands for fi 

We apply here Theorem [T] to the construction of SCB for the regression 
function fi. Let us denote respectively by o"^ and p the variance and correlation 
functions of Z. Without loss of generality, we assume that a"^ is positive over 
[0, l]'^ so that p is well-defined. It stems from Theorem [l] and Slutsky's theorem 
that for any uniformly consistent estimator ct^ of o"^, the standardized estimator 
-y/n(/2 — p) /d converges to ^(0, p) in C{[0, l^) as n — )• oo. For a given confidence 
level 1 — 7, we seek approximate SCB of the form 



p{x) — — ^ , p{x) + 



n \ n 



xG[0,l]^ (4.1) 



where P (II g(0,/>)||oo > c^) ~ 7- 

A convenient estimator of ci^ is the empirical variance function of the smooth 
curves /Ij = Y^j=\ ^j^ij^ ^ — 1, • • • , namely 



dHx) 



n 



1 " 

— Y^{p,{x)-p{x))'- (4.2) 



This estimator is unbiased for the finite sample variance nVar(/I(x)) and it con- 
verges uniformly to a'^{x) in probability. (The uniform convergence is obtained 
by exploiting (A.2)-(A.5) together with a uniform law of large numbers (e.g. 
Pollard, 1990, Th. 8.2) and a classical limit result on the largest eigenvalue of a 
Wishart matrice (Geman, 1980) to control uniformly the errors £ij.) 

Two difficulties arise in the computation of the threshold c^: first, the cor- 
relation function p must be estimated and second, even if p were known, there 
exists no formula for the distribution of the maximum of a general Gaussian pro- 
cess (see e.g. Adler, 1990, p. 5). For the first problem a suitable estimator of p is 
the empirical correlation function 

/?(x, x') = EtiMxm^)-nmn{^') ^4 3) 

(n — 1) cj(x) cr(x ) 

For the second problem we resort to numerical techniques to estimate c-y. (See 
Section 6 for a discussion of the limitations of theoretical approximations to the 
distributions of maxima of Gaussian processes.) This can be done by simulating, 
conditional on p, a large number of sample paths of G{0,p) in order to obtain 
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the law £(11^(0, p)||oo|/5) and then by setting as the associated (1 — 7)100% 
quantile: 

P(||a(0,p)|U>c^|p)=7- (4.4) 

The fact that satisfies approximately P(||^(0, p) ||oo > c-^) = 7 is justified by 
the convergence of Q{Q,p) to Q{G,p) in C([0, 1]''), conditionally on p. This in 
turn stems from: (i) the finite-dimensional convergence of ^(0, p) thanks to the 
uniform convergence of p, and (ii) the asymptotic tightness of Q{Q,p) obtained 
through entropy calculations very similar to those in the Appendix. 
Gathering the previous elements, we obtain the following result. 

Theorem 2. Under the assumptions of Theorem^ the simultaneous confidence 



hands (4.1) have asymptotic coverage level 1 — 7 for p,: 



lim P (p{x) - c^ ^ < p(x) < p{x) + X G [0, 1]*^ ) = 1 - 7 



with the estimators a, p, and the threshold c^ defined in (4.2) -(4.3) -(4.4). 



In the case where the sample size n is small and the process Z cannot be 
assumed to have an approximate normal distribution, it may not be reasonable 
to rely on a functional CLT to build SCB for p. We thus propose, without 
theoretical justification, the following naive bootstrap procedure: 

1. Resample with replacement from the pi,i = 1, . . . , n to produce a bootstrap 
sample pi, ... , /u* . 

2. Compute the empirical mean and variance functions of the p*, say p* and 
(fj*)^, and compute z* = ^/n\\{p* — p)/a*\\oo. 



3. Repeat steps 1 and 2 many times to approximate the conditional law C* 



C{z*\Yi/s) and take the (1 - 7)100% quantile of C* for in ( |4l| ). 



4.2 A goodness-of-fit test for parametric models 

We now apply the ideas underlying Theorems T][2 to a goodness-of-fit test 



for curvilinear regression models. Indeed, with the knowledge of the limit distri- 
bution of an estimator in (C([0, l]'^), || • ||oo), it becomes possible to detect and 
test local departures from a given candidate model for p. This feature should 
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be contrasted with tests based on euclidean norms which only track global de- 
partures. See e.g. Azzalini and Bowman (1993), Hardle and Mammen (1993), 
and Stute (1997) where parametric and nonparametric estimates are compared 
either via their residual sum of squares or directly through L2 distances. 
Consider a candidate parametric model for ^ of the form 



^M = \^^em:{ei,...,eL)^Q^ (4.5) 



where L < 1 is a fixed integer, B C is a parameter space, and {(pi, . . . ,ipl) is 
a family of functions on [0, l]'^ satisfying 

(B.l) The are orthogonal w.r.t. the inner product ((71, 5(2)/ = / gi{x)g2{x)f{x)dx. 

(B.2) The ipi have bounded (partial) derivatives on [0, l]'^ up to order 2. 

Introducing the vectors Y = {Yi, . . . ,Yp)^ , ip{x) = {ipi{x), . . . ,(/3l(x))^ and the 
p X L matrix $ = {ip{xi), . . . , ip{xp))^ , the least squares estimator of ij.{x) under 



(4.5) reads 

J1ls{x) = (^(x)T(*T$)-l$Ty_ ^4 

We now apply the local linear weights W{x) = {Wi{x), . . . , Wp(x))^ to the 



residuals of the parametric fit (4.6). The smoothed residual random process r is 



rix) = ^Wj{x){Y, -fiLsixj)) =Wix)'' {I-P)Y (4.7) 

where I denotes the p x p identity matrix and P = $(^'''^)~-'"$''' denotes the 
p X p projection matrix onto the space spanned by the columns of 

Next, we determine the asymptotic mean and covariance functions of the 



(scaled) process r. Under (4.5), it is straightforward to see that E(r(x)) = 0. 
More generally let P be the orthogonal projection from (L2([0, 1]*^), (•, •)/) onto 
the linear subspace ^A and hy 9 = (6*1, . . . , 6*^)^ = (((/?i, /i), . . . , ^))^ the 
vector of coefficients of Pfi in Ai. Observe that ||IE(/z) — ;u||oo = ||/^"||oo 0(11^11^) 
by (A.l) and the bias properties of local linear estimators. Also, exploiting the 
former bias properties, (B.l), (B.2), and classical error bounds for numerical 
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approximations of integrals, it can be easily proved that 



[1 + y\\M\hf)] y^ixV [I + 0{p-')] d[l + 0{p-')] 

ip{xf0 + o{y\u\hf+p-^) 

Pf,{x)+0{\\hW\\oo+p-') 



uniformly in a; G [0, 1] . Combining these relations with (4.7) yields 



E{r{x)) = fi{x) - Pfi{x) + 0{y\U\hf +P~n- (4. 



With the above calculations one can infer from (4.7) that r has the same asymp- 
totic covariance as the process JI—JIls and then, using these calculations together 
with Theorem [T| the limit covariances and cross-covariances of Ji and JIls (scaled 
by ^/n) are derived without difficulty. (In particular the limit covariance of y/njl 
is R.) Finally the limit covariance function of ^/nr is 



L L 



T{x,x') = R{x,x') + ^^(pk{x)(fi{x') // R{u,v)(pk{u)'Pi{v)f{u)f{v)dudv 

k=l 1=1 

L p Li n 

( / R{x,u)ipi{u)f{u)duj ipi{x') - ^ ( / R{x',u)ipi{u)f{u)duj ipi{x) 



1=1 " ;=i 



(4.9) 



where the simple (resp. double) integrals are taken over [0, 1]'^ (resp. [0, 1]^*^). 

Denote by Ai'^ the orthogonal complement of A4 in (L2([0, l]'^, (•, •) j) and by 
C^([0, 1]*^) the space of functions having continuous partial derivatives on [0, 1]"' 
up to order 2. Using the proof techniques of Theorem [T| the following asymptotic 
normality holds true for ^/nr. 



Theorem 3. Assume (A.1)-(A.5) and (B.1)-(B.2) in model (2.1) with d^ {1,2}. 



Consider the smooth residual process r defined in (4.7) with a bandwidth h sat- 
isfying: — 5- and (p/log(p)) nfc=i hk ^ oo as n,p — )■ oo. 
Then under the null hypothesis ( |4.5| ), ^/nr converges weakly to Q{0, T) in C([0, 1]*^). 
Under the sequence of local alternatives fi = ip^ 9 + gj^fn, where 9 £ @ and 



g e C^([0, l]'^)f]M'^ are fixed, ^Jnr converges weakly to G{g,T) in C([0, 1]''). 
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We now apply Theorem [s] to testing (4.5) against fixed or local alternatives 
in a way that strictly parallels the SCB construction of Section 4.1. In particular 
it will be necessary to estimate the variance and correlation functions ar and pr 
of y/nr as well as a threshold Ca for a related Gaussian process. First note that 
for finite samples, the covariance function of ^/nr is 



Tnix, x') = W{x) ' (I - P) (S + V) (I - P) W{x') 



(4.10) 



where S is the p x p covariance matrix {R{xj,Xk)) and V is the common covari- 
ance matrix of the measurement errors in (A. 5). The matrix (5] + V) can be esti- 
mated by the empirical covariance of the data Yij, which is then plugged in (4.10) 
to produce an estimator T{x,x') of r(a;, x'). The related variance and correlation 
estimators are ar{x) = r{x,xy^'^ and pr{x,x') = r{x,x')/ {ar{x)ar{x')) . Now 
for a given significance level a, a threshold Cq such that P(||^(0, pr) ||oo > Ca) ~ a 
may be found exactly as in Section 4.1 by numerical simulation of ||^(0, ^r)||oo 
conditional on pr followed by the computation of the (1 — a) 100% quantile of 
the resulting distribution: P(||^(0, /3r)||oo > Cajpr) = «• 

From (4.8) and Theorem [sj we deduce the following result. 



Corollary 1. In model (2.1), consider the candidate model (4.5) for p and the 



test statistic T = y/n\\r /ar\\^ defined by (4.7)-(4.10). For a given a E (0, 1), let 
Ca be the conditional quantile defined above. 

Under the assumptions of Theorem^ the test obtained by rejecting (4.5) ifT > Cq, 
has asymptotic significance level a and is consistent against any fixed alternative 
Hi : p = g ^ C^([0, 1]^) Pi A^^. Given a constant B > and a real sequence 
Cji > such that n~^/^ = o(e„), the test is also consistent against the sequence 



of local alternatives Hn : p ^ {g €z C^HO, 1]' 



l/lloo < B, 



\9 - PgWoo = en}- 



Note that graphically, the test can be interpreted as plotting the SCB [r{x) ± 
Ca^^{x)/^/ri)'j for p{x) — Pp{x) and rejecting (4.5) if the horizontal line y = 
is not contained within the bands. 



5. Numerical study 

5.1 Normal and bootstrap SCB procedures 

In this section we assess the normal and bootstrap SCB procedures of Section 
4.1 in terms of coverage and amplitude through the numerical study of two exam- 
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pies of model (2.1 ). In short, the first example depicts a favorable situation with 
a smooth polynomial trend, Gaussian data, and no measurement errors while the 
second features very adverse conditions with a rapidly varying trend, a strongly 
non-normal random process Z, and additive white noise. More specifically, the 
first example, taken from Hart and Wehrly (1986), is 

Yij = n{xj) + Zi{xj), I < i < n,l < j < p, 
u(x) = lOx^ — 15x^ + 6x^, 

(5.1) 

= U - 0.5)/p, 

Zi'^g{0,R) with R{x,x') = (0.25)2exp(201og(0.9)|j;-x'|). 
Here, the xj are equidistant and the Zi are distributed as a centered Gaussian 
process with an Ornstein-Uhlenbeck covariance function chosen so that any two 
measurements spaced by 0.05 units have correlation 0.9. The noise level a = 0.25 
represents 25% of the range of the trend /x, which is considered as a moderate 
amount of noise in the data. The second model is specified by 

Yij = fj.{xj) + Zi{xj) + £ij, 1 <i <n,l < j <p, 
fj,{x) = sin(87rx) exp(— 3x), Xj = {j — 0.5) /p, 

Zi ~ Z with Z{x) = (\/2/6) (??i - 1) sin(7rx) + (2/3) (r/2 - {x - 0.5), 

^1 ~ Xi5 and r/2 ~ Exponential(l), 

Eij *~ A^(0,0.1^), Eij and Zi independent. 

(5.2) 

In this case, the regression function fi displays rapid variations over [0, 1] and 
has a sharp peak near the origin at x = 0.058. The process Z strongly deviates 
from normality, being based on chi-square and exponential r.v.. The standard 
deviation function d-{x) = {R{x,x) +0.1^)^/^ ranges between 0.295 and 0.348, 
which represents a fraction between 21% and 25% of the range of However, 
looking at the local variations of fi as measured by and the noise level a, 
it appears that \fJ-"\/d' ranges in [7, 1650] (compare with the range [0, 23] for the 
same function ratio in (5.1)). Such a range indicates than in regions where fi 
has high curvature, i.e. around peaks and troughs, serious estimation problems 
should arise due to the fact that the (squared) bias will be overwhelmingly larger 
than the variance, a violation of the conditions of Theorem [TJ In particular near 
x = 0.058, the problem will be prominent since the classical peak underestimation 
problem will combine with boundary effects. 
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The simulations were conducted in the R environment as follows. For each 



model (5.1) or (5.2), several values were selected for the sample size n, the design 
size p, and the bandwidth h of the local linear estimator /I. For each (n, p, h) the 
model was simulated Nyep = 50, 000 times to assess the normal bands and only 
N^ep = 5,000 times for the bootstrap due to its heavy computational cost. The 
bands were built at the confidence level 1 — 7 = 95% and their coverage levels (i.e. 
the proportion of simulations for which the bands contained /i) and amplitudes 



(in terms of the threshold c-y of (4.1 )) were recorded. In model ( |5.1[ ) the margins 
of error in the coverage levels can be evaluated as about y^7(1 — ^)/Nrep = 
0.0009, 0.0031 for the normal and bootstrap procedures, respectively. In model 
(5.2) the observed coverage levels are quite different from the target 95% and 
it seems more reasonable to evaluate the margins of errors by l/{2y/Nrep) = 
0.0022, 0.0070 respectively. At another level, it has been observed that for a given 
setup {n,p, h), the main source of variability in the bands' amplitude lies in the 
estimation of whereas the estimation of C7^(x) bears little influence. This is why 
the bands' average amplitudes are displayed in terms of c^, which besides allows 
for a direct comparison with the correct thresholds yielding nominal coverage. 

The SCB were implemented as follows. For the normal SCB, fi was estimated 
by a local linear fit with the Epanechnikov kernel K{x) = 0.75max(l — x^,0). 
For this task a R script based on sparse matrix representations was written by 
the author, allowing for fast and exact evaluations. The variance function cj^ 
of Z was estimated by the empirical variance function of the /Ij as in Section 
4.1. The correlation function p oi Z was estimated by a shrinkage estimator p 
based on the empirical correlation of the /2j (R package corpcor). After that, 
a number N of sample paths of the process ^(0,p) were simulated on an equis- 
paced grid of size 100 in [0, 1] and the threshold in (4.1 ) was computed as the 
95%-quantile of the associated sup norms (A^ was set to 8000, 10000, and 13000 
for p = 10, 20, 100, respectively, to ensure a good tradeoff between numerical ac- 
curacy and computational time). Concerning the bootstrap SCB, /i and a"^ were 
estimated as in the normal SCB procedure and the threshold was estimated 
as in Section 4.1 with 2500 bootstraps. 

It can be observed from Table [ST] that both the normal and bootstrap SCB 



methods work quite well in model (5.1), for a wide range of combinations of n,p 
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n 


P 


h 


Normal SCB 


Bootstrap SCB 


Correct 95% threshold 


Coverage 




Coverage 




R estimated 


R known 


10 


10 


0.2 


0.926 


3.20 


0.977 


4.14 


3.51 


2.66 


20 


20 


0.1 


0.957 


3.11 


0.964 


3.19 


3.07 


2.69 


20 


20 


0.15 


0.955 


3.08 


0.961 


3.10 


3.01 


2.61 


20 


20 


0.2 


0.951 


3.05 


0.948 


3.03 


3.08 


2.72 


50 


50 


0.05 


0.962 


3.01 


0.947 


2.90 


2.90 


2.73 


50 


100 


0.05 


0.961 


3.00 


0.953 


2.91 


2.89 


2.69 


50 


100 


0.1 


0.970 


2.95 


0.947 


2.82 


2.82 


2.60 


100 


10 


0.16 


0.931 


2.84 


0.903 


2.70 


2.98 


2.94 


100 


10 


0.2 


0.879 


2.81 


0.838 


2.67 


3.23 


3.03 


100 


20 


0.1 


0.960 


2.88 


0.966 


2.75 


2.79 


2.69 


100 


20 


0.15 


0.941 


2.83 


0.927 


2.69 


2.88 


2.72 


100 


20 


0.2 


0.892 


2.80 


0.874 


2.65 


3.16 


2.89 


100 


50 


0.05 


0.961 


2.93 


0.946 


2.82 


2.82 


2.73 


100 


100 


0.05 


0.961 


2.92 


0.952 


2.82 


2.82 


2.70 


100 


100 


0.1 


0.960 


2.87 


0.945 


2.73 


2.77 


2.62 


100 


100 


0.15 


0.945 


2.82 


0.946 


2.68 


2.86 


2.65 



Table 5.1: Observed coverage levels and thresholds for SCB of nominal level 95% in 
model (5.1). For each {n,p,h), the model was simulated 50,000 and 5,000 times for the 
normal and bootstrap procedures, respectively. The two columns c-y indicate the median 
threshold obtained in (4.1 1. The last two columns show the actual thresholds yielding 
95% coverage when the covariance function R is estimated or known. 



and h. They have similar performances (see Figure 1) and achieve a coverage near 
the target level 95%. This positive result can be explained by three favorable 



aspects of (5.1): (i) the low curvature of the polynomial function fi; (ii) the 
absence of measurement errors; and (iii) the normality of Z. The first point 
ensures that even for large n and small p, the squared bias of ju remains uniformly 
small before its variance over [0,1]. The second point allows the use of small 
bandwidths since no smoothing is needed to control the absent errors. (In this 
case the second condition in (A. 4) and the condition {p/ log(p)) nfc=i /ifc — ^ co in 
Theorem 1 can be dropped.) The second and third points imply, for the normal 
bands, that the normal approximation to the distribution of Jl is exact. 
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Normal SCB 


Bootstrap SCB 


Correct 95% threshold 


n 


P 


h 


Coverage 




Coverage 




R estimated 


R known 


10 


20 


0.08 


0.156 


3.29 


0.663 


6.10 


10.31 


5.41 


10 


50 


0.035 


0.769 


3.29 


0.959 


6.95 


6.16 


2.98 


10 


50 


0.05 


0.713 


3.28 


0.941 


6.50 


6.40 


3.44 


15 


20 


0.08 


0.055 


3.24 


0.341 


4.47 


10.23 


6.22 


20 


20 


0.08 


0.013 


3.20 


0.131 


3.87 


10.53 


6.92 


20 


50 


0.035 


0.839 


3.20 


0.928 


4.12 


4.48 


3.13 


20 


50 


0.05 


0.704 


3.19 


0.852 


4.02 


4.91 


3.90 


20 


100 


0.02 


0.878 


3.20 


0.934 


4.22 


4.28 


2.78 


20 


100 


0.05 


0.716 


3.18 


0.842 


3.93 


4.94 


3.84 


50 


50 


0.035 


0.814 


3.07 


0.837 


3.20 


3.93 


3.59 


100 


100 


0.02 


0.929 


2.97 


0.927 


2.97 


3.18 


2.96 


100 


100 


0.035 


0.738 


2.94 


0.715 


2.88 


3.81 


3.82 



Table 5.2: Observed coverage levels and thresholds for SCB of nominal level 95% in 
model (5.2). For each [n,p,h), the model was simulated 50,000 and 5,000 times for the 
normal and bootstrap procedures, respectively. The two columns c^ indicate the median 
threshold obtained in (4.1 1. The last two columns show the actual thresholds yielding 
nominal coverage when the covariance R is estimated or known. 



In model (5.2) the estimation conditions are very adverse, as seen earlier. It 



is thus no surprise to observe in Table 5.2 that the coverage levels fall short of 
the 95% target level both for normal and bootstrap SCB, although the bootstrap 
is more robust. On the other hand the last two columns of Table 2 show how 



intrinsically difficult the band estimation is in (5.2). For instance when p = 20 
and R is unknown, the threshold yielding correct coverage is close to 10 (compare 
to the 95% standard normal quantile 1.96 used in pointwise confidence bands), 
yielding SCB so large that they loose all practical interest. (See also the right 
panel in Figure 1.) Note that the extreme difficulty of the case p = 20 stems 
mostly from the sparsity of the data near the sharp peak of /x at x = 0.058. 
Regarding the influence of smoothing on the coverage level, it appears in Table 



5.2 that the smaller the bandwidth h, the higher the coverage. This observation is 



essentially related to the control of the bias and it has been confirmed with a wider 
range of values h not displayed here. For each p = 20, 50, 100, the values selected 
for h were first, the smallest h for which Jl is well-defined on the evaluation grid 
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and second, a nearby value indicating how quickly the coverage degrades when h 
increases. Interestingly enough, increasing the sample size n has different effects 
on the coverage according to p: for p = 20, as n increases the coverage decreases. 
This is due to the corresponding decrease in Var(ju(3;)) ~ cj^(x)/n, which makes 
the squared bias increasingly non-negligible before the variance. For p = 50, 
increasing n also increases the squared bias to variance ratio but the latter may 
remain negligible provided that n is not too big: the coverage increases from 
n = 10 to n = 20 and then decreases from n = 20 to n = 50. For p = 100 
the coverage, as a function of n, would start to decrease after a value n much 
larger than 100. (Note that the supremum of the squared bias to variance ratio 
is asymptotic to • ||/i"/cr^||oo-) Two other important effects of increasing 

n are first to reduce the stochastic error in the normal approximation to the 
distribution of ju, and second to improve the estimation of o"^. 




Figure 5.1: SCB for the regression function fi. Left panel: model (5.1) with n = p = 50 
and h — 0.035. The normal and bootstrap bands are identical and achieve the target 
coverage level 95%. Right panel: model (5.2) with n = 10, p = 50, and h — 0.05. The 



bootstrap bands are wider than the normal ones (c-y 
have nearly nominal coverage (94-1% vs 71.3%). 



6.50 vs c-y 



3.28 in (4.1); and 
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To conclude this section, we comment briefly on additional simulations not 



displayed here. When simulating model (5.1) with difl'erent correlation levels, 
replacing the parameter v = 0.9 in the covariance i? by z/ = 0.7 or 0.5, the cov- 
erage levels are close to nominal for the two SCB procedures and as expected, 
the threshold increases as the amount of correlation v decreases. Also, when 



crossing the functions fi and R of (5.1) and (5.2) in new simulations, we are 



confirmed in the idea that the coverage level depends mostly on the (negligibility 
of) the squared bias to variance ratio and on the (non-)normality of the estimator. 

5.2 Goodness-of-fit: comparison between the SCB test and a Pseudo- 
Likelihood Ratio Test 

This section assesses numerically the statistical significance and the power of 
the goodness-of-fit test of Section 4.2, referred to as the SCB test henceforth. We 
use the Pseudo-Likelihood Ratio Test (PLRT) of Azzalini and Bowman (1993) 
as a benchmark for comparison because of its generality and simplicity of im- 



plementation in model (2.1). Hereafter we proceed to describe the simulation 
model, the implementation of the tests, and the experimental results. 
The model under study is 

Yij = fi{xj) + Zi{xj), 1 <i <n,l < j <p, 
Xj = [j - 0.5)/p, 
Ho : fj.{x) = X, 

Hn ■ n{x) = X + n^^/^ \og{n)g{x) 

with 5GC2([0, 1]), g{x) = for x G [0, 0.4] U [0.6, 1], 
g{x) G VTs for x G (0.4,0.45], g{x) G vrs for x G (0.55,0.6], 
g{x) = 0.2 exp(-(x - 0.5)^) for x G (0.45, 0.55], 
Zi ~ Z = g{0,R) with R{x,x') = (0.25)^ exp(20 log(0.9)|x - x'|), 

where vTfc denotes the space of polynomials of degree at most k. (The covariance 



(5.3) 



structure of the data is the same as in model (5.1 ).) The candidate model for n is 
= vTi, the space of linear functions. The local alternatives Hn are obtained by 
adding a scaled bump function g to ^o{x) = x, which produces a local nonlinearity 
on [0.4,0.6]. On account of Section 4.2 we can expect the SCB test to detect the 
nonlinearity in /u while the PLRT might very well miss it. 
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The simulations were realized similarly to Section 5.1. For different values of 



{n,p, h), model (5.3) was simulated 50,000 times under Hq and Hn, the goodness- 
of-fit tests were implemented and their type I&II error rates measured. The SCB 
test required to estimate the covariance function of the test statistic ^/nr defined 
by (4.7) and the threshold Ca of Corollary [T] The covariance was estimated 
by shrinking the empirical covariance of the data with the R package corpcor 
and plugging the shrinkage matrix into (4.10) in place of S. (Observe that the 
error covariance matrix V is zero in the absence of measurement errors in (5.3).) 
The threshold was estimated as in Section 4.2, using an equispaced grid of 
size 100 to simulate realizations of the Gaussian process G{0,pr) conditional on 
the correlation estimator pr- The PLRT of Azzalini and Bowman (1993) was 
implemented with standard R packages. We give here a short description of this 



procedure in the context of (5.3). The test starts by fitting a regression line JIls 
and a local linear estimator /I to the averaged data {xj,Yj). It then computes 
the test statistic F = RSSq/ RSSi — 1, where RSSq and RSSi are the residual 
sums of squares of fiLs and /i, respectively. Denoting by Fobs the observed value 
of F and putting Zp = {Z{xi), . . . , Z(xp))^, the p-value P(F > Fobs\Ho) can be 
written as P{Zp^ AZp > 0) for some pxp symmetric matrix A depending on Fobs 
and the smoothing matrices of flis and Ji. The distribution of Zp KZp is well 
approximated by an ax^+c distribution, where a, 6, c depend on A and S and are 
obtained by matching the first three cumulants of the two distributions. The p- 
value then obtains as 1 — P(x^ < —c/a). Returning to our simulations, the PLRT 
required to estimate the unknown covariance I] = {a"^ p^^~^^), with p = (0.9)^"/'^ 
and a = 0.25. The estimation was done either with the empirical covariance 
of the data, either by correctly assuming an AR(1) model for the discretized 
process (Z(xi), . . . , Z{xp)) and estimating and p through standard repeated 
measurements techniques (e.g. Hart and Wehrly, 1986). To assess the influence 
of covariance estimation, the PLRT was also implemented with S known. 

Table [K3] displays the type I error rates over the simulations. For the PLRT 
procedure, when S is known, this rate is very near the significance level a = 5% 
as expected. When S is estimated parametrically the rate is slightly excessive, 
around 6%. In the case of a nonparametric estimation of S, the PLRT is clearly 
not accurate for a small sample size n = p = 10, slightly off for n = p = 20, and 
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SCB 




PLRT 




n 


P 


h 


Gov. nonpar. 


Gov. nonpar. 


Gov. par. 


Gov. known 


10 


10 


0.17 


0.038 


0.093 


0.065 


0.051 


10 


10 


0.25 


0.047 


0.089 


0.060 


0.052 


20 


20 


0.08 


0.047 


0.063 


0.062 


0.050 


20 


20 


0.15 


0.074 


0.061 


0.063 


0.051 


50 


50 


0.035 


0.053 


0.049 


0.062 


0.050 


50 


50 


0.05 


0.061 


0.050 


0.063 


0.050 


100 


100 


0.02 


0.053 


0.047 


0.064 


0.050 


100 


100 


0.05 


0.063 


0.046 


0.065 


0.052 



Table 5.3: Type I error rates in testing for linearity in model (5.3 I at the significance level 
a = 5%. For each {n,p,h) and each test, 50,000 simulations were run. In the PLRT 
procedure, the covariance structure of the data was either estimated nonparametrically, 
parametrically, or known. 



it works fine for n = p = 50 and n = p = 100. In comparison, tlie SCB test is 
overall less accurate regarding the target level a = 5% except for small sample 
sizes where it seems more robust. It should be noted however that for each {n,p), 
there is at least one value h yielding nearly nominal coverage. 



Looking at Table 5.4 it appears that the SCB test has a much larger statis- 
tical power than the PLRT. Across the simulations, the average power is 80% for 
the SCB test versus about 35% for the PLRT. Besides the power does not go be- 
low 45% for the SCB test while it can be as low as l%-5% for the PLRT for small 
samples. In other simulations not displayed here, the superiority of the SCB test 
gets even larger if the bump function n~^^'^log{n)g(x) in (5.3) is replaced by 
a smaller bump n^^^^ \oglog{n)g{x). Heuristically, the low power of the PLRT 
can be attributed to the fact that since Hn is local in nature and F is based 
on a euclidean norm, the local discrepancy between /I and fiLS the bump is 
masked by their global agreement on [0, 0.4] U [0.6, 1]. Put differently, there is no 
clear-cut difference between the distribution of F under Hq and under Hn until 



|a*o — /^n||L2 becomes "large" enough, for large n,p and small h. See Figure 5.2 



(the densities have been obtained by numerical simulation). Note that analytic 
power calculations can be obtained for the PLRT via saddlepoint approximations 
to noncentral F distributions (see e.g. Butler and Paolella, 2002). 
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H 


P 




V_ ' U V . iHJliL^dii . 


\_J\J V . llUilUcLl . 


KjKJ V . Udi. . 




10 


10 


0.17 


0.512 


0.052 


0.025 


0.015 


10 


10 


0.25 


0.459 


0.031 


0.012 


0.008 


20 


20 


0.08 


0.807 


0.230 


0.238 


0.190 


20 


20 


0.15 


0.664 


0.061 


0.059 


0.040 


50 


50 


0.035 


0.993 


0.454 


0.548 


0.438 


50 


50 


0.05 


0.995 


0.232 


0.307 


0.232 


100 


100 


0.02 


1.000 


0.990 


0.996 


0.994 


100 


100 


0.05 


1.000 


0.390 


0.516 


0.423 



Table 5.4: Statistical power of the the SCB and PLRT procedures in model (5.3 1. The 
nominal significance level was a — 5% and 50,000 simulations were executed for each 
{n,p,h) and each procedure. For the PLRT procedure, the covariance structure of the 
data is either estimated nonparametrically, parametrically, or known. 





Figure 5.2: Density curves for the PLRT statistic F (left panel) and the sup norm- 
based statistic T (right panel) under Hq and iJ„ in model ( |5.3| with n ^ p — and 
h = 0.05. The vertical lines indicate the critical points for the tests at the level a ~ 5%. 
The associated statistical power is 23.6% for the PLRT and 98.9% for the SCB test 



(approximated by 23.2% and 99.5% in the simulations of Table 5.4) 
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6. Illustration with a speech data set 

In this section we look into a speech data set studied by Hastie et al. (2009) 
and available on the web at http : //www-stat . Stanford. edu~tibs/EleinStatLearn/. 
The data consist in 4509 log-periodograms obtained at 256 equidistant frequen- 
cies in the range 0-8kHz. Each discretized curve corresponds to one of five 
phonemes coded as 'aa' as the vowel in 'dark' (695 curves); 'ao' as the first 
vowel in 'water' (1022 curves); 'del' as in 'dark' (757 curves); 'iy' as the vowel in 
'she' (1163 curves); and 'sh' as in 'she' (872 curves). For simplicity of notation 
we rewrite the observation points in the frequency domain as 1, . . . , 256. 

To illustrate the possible uses of SCB techniques, we present three inference 
procedures relevant to the statistical analysis of our data set. 

6.1 Band estimation of regression curves. 

We apply the SCB procedure of Section 4.1 to the mean regression curves for 
each phoneme. Prior to inferring the mean regression curve, it is worth examining 
how this mean curve relates to the individual ones. Indeed it may very well be 
that the mean curve does not resembles any single curve at all. In our case, the 
roughness in the log-periodograms is smoothed out by averaging over the large 
sample available. However some salient features in the individual curves such as 
peaks and valleys are recovered after averaging, due to the remarkable fact that 
these features are present in almost all the curves at approximately the same 



locations (see Figure 6.3 and Section 5.3). We observe that smoothing in the 
frequency domain seems necessary to make the curves more readily analysable 
and interpretable. The general allure of the individual smoothed curves (e.g. fre- 
quency subdomains where the log-intensity is approximately monotone or linear) 
is also conserved through the averaging. 



The average log-periodograms are displayed in Figure 6.3 For each phoneme, 
the empirical standard deviation curve varies between 1 and 3 log-intensity units, 
which represents a fraction of the range of the average log-periodogram varying 
between 10% and 21% for the phonemes 'aa' and 'ao' (indicating a low variability 
in the data), 15% and 30% for 'del' and 'iy' (low to moderate variability), and 
between 28% and 35% for 'sh' (moderate to large variability). For brevity we 
only show the SCB for the phoneme 'sh' based on the 872 available curves. The 
SCB is built at the levels 95% and 99%, using a local linear estimator with a 



22 



DAVID A. DEGRAS 







aa 


o 










del 






- - iy 








— sh 










o 












in - 







1 I I I I 

50 100 150 200 250 



frequency (rescaled) 

Figure 6.3: Average log-periodograms. For each phoneme, the roughness in the individual 
curves is smoothed out by averaging but the peaks and valleys are conserved. 
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Figure 6.4: SCB of levels 95% and 99% for the regression curve of the phoneme 'sh'. 
Due to the large sample size (n = 872^, the bands have a small amplitude allowing to 
confirm the remarkable features in the regression curve ( existence and location of local 
extrema, monotonicity patterns, etc.). Note that the bands are almost identical at the 
two confidence levels. 
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(truncated) Gaussian kernel (R package locpoly) and the bandwidth h = 0.94 
that minimizes the leave-one-curve-out cross validation score (see e.g. Hart and 
Wehrly, 1993). 

6.2 Comparison of regression curves. 

After building SCB for a single regression curve, we turn to another im- 
portant inference task which is the comparison of two mean curves. Figure 



6.3 indicates quite a number of similarities between the regression curves for 



the phonemes 'aa' and 'ao'. A formal inference tool that could confirm or in- 
firm the hypothesis of equality between the two curves would indeed be de- 
sirable. Such a procedure can be achieved simply by following the method of 
Section 4.1: (i) for each phoneme 'aa' and 'ao', build the corresponding esti- 
mator Jl and its estimated covariance R/n; (ii) estimate the difference in the 
regressions {fiaa — fJ-ao) by {fiaa — J^ao) whose estimated covariance is Raa-ao = 
{Raa/naa + Rao/nao) (obscrve that 

^aa and ^ao s^e independent); (iii) denoting 
by ^aa-ao and Paa-ao the Standard deviation and correlation functions associ- 
ated to Raa-ao, obtain numerically the distribution of ||^(0, paa-ao)||oo and for 
a given significance level a, use the relevant quantile Cq, of this distribution to 
build the SCB { \{jlaa — Jj-ao){x) ± Cadaa-aoi^)] : < X < 256} of level 1 — a for 
il^aa — IJ-ao)', (v) reject Hq : ^aa = Mao if the horizontal line is not within the bands 
or equivalently, if /laa is not within the bands centered on /lao- By implementing 
this procedure with any reasonable bandwidth, Hq is rejected at any significance 
level (p- value < 10^^^). 

6.3 Prediction of individual curves. 

The ability to predict new curves and to assess their range of variation proves 
useful in various situations, e.g. in voice recognition where the goal is to identify 
the speaker. In the present data set where only a few curves are available for 
each subject, we study prediction by randomly splitting the available curves for 
each phoneme into a training set and a test set of equal sizes. Prediction bands 
are built from the training set as in Section 4.1 (omitting of course the factor ^/n 



in (4.1) since the goal here is prediction and not regression estimation) and their 
coverage levels, i.e. the proportions of curves in the test set contained within 
the bands, are observed in function of the amount of smoothing applied to the 
data. Ten fixed bandwidths h = 1, . . . , 10, are considered as well as a data- 
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Figure 6.5: Test for equality of the regression curves fiaa md fXao- SCB of level 99% 
are plotted around the estimate 'jlaa- Since flao is not within the hands, the hypothesis 
Hq : iJ-aa — l^ao Can be rejected at the significance level a — 1% (in fact, at any a). 



driven bandwidth obtained by splitting the training set in half and selecting the 
bandwidth /i G {1, . . . , 10} that gives the closest coverage to the target level 95% 
or 99% for the other half of the training set. For each phoneme the random split 



is repeated 50 times. The mean coverage levels are reported in Table 6.5 for a 
subset of values of h. 



Table 6.5 indicates that the coverage levels are very close to nominal as 
soon as the bandwidth is large enough (and in particular for the data-driven 
bandwidth h) except for the phoneme 'del'. These results can be explained by 
the facts that (i) a minimal amount of smoothing is needed to attenuate the 
erratic, spikey behavior in the raw data curves and make them more predictable; 
(ii) the distributions of the data curves appear approximately Gaussian for all 
phonemes except for 'del' which displays strong non-normality. (Our diagnostics 
for normality were established by performing a functional principal components 
analysis (PCA) for each set of curves, inspecting visually the plots of the scores 
along the first few components, and running Shapiro- Wilks tests on the scores.) 
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h 


1 


2 


5 


8 


10 


h 


aa 


0.927 


0.938 


0.944 


0.943 


0.941 


0.945 


ao 


0.924 


0.940 


0.942 


0.947 


0.947 


0.948 


del 


0.888 


0.882 


0.884 


0.886 


0.885 


0.890 


iy 


0.919 


0.936 


0.945 


0.943 


0.944 


0.946 


sh 


0.922 


0.941 


0.949 


0.951 


0.951 


0.952 


aa 


0.985 


0.984 


0.984 


0.983 


0.984 


0.987 


ao 


0.977 


0.984 


0.988 


0.988 


0.988 


0.989 


del 


0.945 


0.933 


0.931 


0.932 


0.932 


0.945 


iy 


0.971 


0.984 


0.991 


0.993 


0.992 


0.991 


sh 


0.980 


0.992 


0.992 


0.991 


0.992 


0.992 



Table 6.5; Coverage levels for the prediction of new curves with SCB of levels 95% (5 top 
rows) and 99% (5 bottom rows). For each phoneme, the hands were based on a random sample 
comprising half of the available curves and used to predict the remaining half of the curves. The 
random sampling was replicated 50 times. The reported numbers are the mean coverage levels 
over the replications in function of the bandwidth used. The column h denotes a data-driven 
bandwidth selection procedure. 



7. Discussion 

We have established in this paper a functional asymptotic normality result 
for nonparametric regression with functional data. The result allows to build 
SCB that prove useful in various statistical tasks such as estimating the re- 
gression function, testing the goodness of fit of parametric models, testing the 
equality of mean curves, and predicting individual curves. The SCB procedures 
are fully nonparametric (regression and covariance estimation) and the required 
bandwidth selection can be data-driven. 

It has been seen that the SCB estimation of the regression /i yields accurate 
coverage whenever ^ is reasonably smooth and sufficient data are available. It 
produces significantly better results than an initial attempt of the author to ex- 
tend the SCI of Degras (2008) to full bands via the interpolation arguments of 
Hall and Titterington (1988). (This approach required the difficult estimation 
of derivatives of /x, causing visually unattractive confidence bands and low cov- 
erage.) The present SCB estimation of /i, which relies on a numerical method 



to compute the threshold in (4.1), also improves upon previous attempts to 
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approximate via theoretical formulae such as Borell's inequality (see (13) in 
Section A. 2) which is too conservative, or the limit result of Landau and Shepp 
(1970) which only depends on the confidence level 1 — 7 and not on the limit 
correlation function p of /I (see Degras, 2009). (Indeed a sensible estimate of 
should depend on p since the stronger the correlation structure of a (centered, 
Gaussian) process, the less likely it is to jump above a given threshold c > 0.) 

On the basis of our numerical study, the SCB goodness-of-fit test clearly 
outperforms the PLRT of Azzalini and Bowman (1993) in detecting local de- 
partures of /i from a linear model while retaining a close-to-nominal significance 
level. This superiority, due to the use of a supremum norm in the test, can be 
expected to maintain before other tests based on residual sums of squares or L2 
distances. On the other hand, the latter kind of test will probably do a better job 
at detecting small but global departures from a parametric model. We remark 
that beyond curvilinear models, the SCB test for goodness-of-fit can be extended 
e.g. to nonlinear parametric models fitted by maximum likelihood. 

The application of the SCB method to functional prediction (Section 5.3) 
relies on the approximate normality of the data. If normality does not hold, 
one may resort to the bootstrap method proposed in Section 4.1. Another use 
of SCB with potential interest resides in the estimation of local extrema of the 
regression function /x: because the functional asymptotic normality result of this 
paper also holds for the estimation of jj! (a formal proof is obviously beyond our 
scope here), it is possible to build SCB for p! and derive confidence intervals for 
the location and size of local extrema based on the zero crossings of the bands. 
See Song et al. (2006) for a related work on microarray data. 

We say a word about data-driven bandwidth selection and bias reduction. 
Firstly, the popular leave-one-curve-out cross-validation technique appears well 
suited to our setup because of its practical efficiency and its optimality proper- 
ties with functional data (Hart and Wehrly, 1993). Since by construction the 
bandwidth hcv in this method is of order it suffices to slightly strengthen 

the condition (A. 4) into 'n}/^ log(p) = o{p) for our results to hold with hcv when 
d = \. Secondly, our results extend easily to jacknife-type estimators of the form 
2/i/i — /I^^ and to local quadratic estimators, which allows to reduce the bias 
from order h?' (local linear) to , assuming 3 bounded derivatives for ^ in (A.l). 
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Finally, we mention a possible extension to this work which will be of partic- 
ular interest for handling functional time series: does the functional asymptotic 
normality of the estimator still hold in the case of dependent data curves? If so, 
what is the normalizing rate? 
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A. Proof of Theorem 1 

The proof of Theorem [T] consists in: first, checking that the squared bias 



of the local linear estimator defined in (2.3)-(2.5) is uniformly negligible 
before its variance over [0, 1]°' as n — )• oo; second, establishing the conditions of 
the functional CLT 10.6 of Pollard (1990), which mostly amounts to proving the 
manageability of the smooth curves = Yl^=i^ji^)^ii^j)'' third, showing 

that the smoothed error process X]j=i Wj{x)£j goes uniformly to zero in prob- 
ability at a rate faster than n~^/^ for x G [0, 1]"^. The second and third points 
will be addressed in Sections A.l and A. 2, respectively. Putting these results 
together directly yields the theorem, given the decomposition 



Mx) - Kx) = (E(/2(x)) - /z(x)) + Wj{x)Z{xj) + W,{x)e, . (1) 



The proof of the theorem being essentially the same in dimensions d = 1,2, 
we only address the univariate case and will briefiy indicate how the argu- 
ments extend to the bivariate case. Throughout this section the letter C de- 
notes a generic positive constant not depending on nor h. The notation 
Jx = {j '■ \xj — x\ < h} is used for the set of indexes j for which Wj{x) ^ (recall 
that K has support [—1, 1]). The cardinality \ Jx\ is of order ph due to (A. 3). 



We address here the issue of bias control. With classical bias results for local 
linear estimators (e.g. Fan (1992)) and Theorem 1 of Degras (2008), it is easy to 
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see that under (A.1)-(A.3), 

sup,,[o,i]'' Mm) - = \\f^"\\io o{\\hr) 

sup^,g[o,i]d I Var(/i(x)) — n^^R{x,x)\ = o{n^^) 

as n,p — )• oo, /i — )• and ^11^=1 oo. This entails the condition n||/i||^ — )• 

in order to make the first neghgible before the second. 



A.l Manageability 

Let us write 4>in{x) = n^-*^/^ X]i=i ^j{x)^iixj) for i = 1, . . . ,n and X„ = 
Y17=i ^in- Our aim here is to show the asymptotic normahty of Xn in C([0, 1]). 
To do this, we need to estabhsh the conditions (i)-(v) of the functional CLT 10.6 
of Pollard (1990). We start by defining the objects relevant to this theorem. Let 
^ni = n~^^'^{\Zi(0)\ + Mi) for i = 1, . . . ,n, where the Mi are the r.v. appearing 
in assumption (A. 2), and consider the envelop = i^ni, • • • , ^nn) for the (pin- 
Also define Pn{x,x') = [Yll'=i^{(f'in{x) - </'i„(x'))^] 

Using the fact that the Zi are independent and distributed as Z and conver- 
gence properties of local linear fits, it appears easily that 



olix,x')=E(J2iWjix)-W,ix'))Z{ 



= Y1 [W,{x)Wk{x) - 2Wj{x)Wk{x') + Wj{x')Wk{x'))R{xj,Xk) 

= R{x, x) - 2R{x, x') + R{x', x') + o(l) (2) 

as n — )• oo, /i — 7- and ph ^ oo. Observe that with the same arguments as 
above, K{Xn{x)Xn{x')) — )• R{x, x') as n — )• oo, /i — )• and ph — )• oo, which is con- 
dition (ii) of the aforementioned theorem. Conditions (iii) and (iv) hold because 

T^umid = n\zm + m)^ < oo by (a.2) and Y:un^l^mn^ > 4) = 

E((|Z(0)| +M)2/{(|Z(0)| +M) > eV^}) ^ as n ^ oo for ah e > 0. Condition 
(v) is guaranteed by the uniform convergence in ^ which comes from the conti- 
nuity of R over [0, 1]^ (taking expectations in (A.2)) and from the uniformity of 
the local linear approximation to continuous functions over compact domains. 
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It remains to show the more difficult condition (i) , namely the manageability 
property of the (pni,i = 1, with respect to the envelop Given an 

arbitrary real e > 0, this amounts to evaluating the smallest number N{e) such 
that there exist ri, . . . , Tjv(e) G [0, 1] verifying 

Va^G [0,1], 3ke{l,...,N{e)} : Vi e {1, . . . ,n}, \(pni{x) - (l)ni{Tk)\ < e^ni- 

Note that the packing numbers, euclidean norm and rescaling terminology of 
Definition 7.9 in Pollard (1990) have been rephrased in terms of covering numbers 
and loo norm after observing that T,i=i of {(f)niix) - (/'m(x'))^ < Y17=i(^i^li 
for all rescaling (ai, . . . ,a„) G is equivalent to \4>ni{x) — (pni{x')\ < e$„j for 
i = 1, . . . , n. Let us fix e > and distinguish two cases according to h = h{n). 

• < e. First write 



i{x) - Zi{x)\ 



< CMikf^ < CMie (3) 



for all X G [0,1] and all h > l/(2pmax[gj](/)) (the latter condition ensures 
well-definiteness of local linear smoothing under the design (A. 3)), by using the 
compacity of the support of K, the Holder-continuity assumption (A. 2) for Zi, 
and the trivial fact that Yl^=i is uniformly bounded in x and n. Again 

with (A. 2), observe that \Zi(x) — Zi{x')\ < CMie as soon as \x — x'\ < e^^^. 
Conclude that for all {x,x') such that |x — x'| < e^^^, we have 

\(l)ni{x)-(t>ni{x')\<3Ce^ni, (4) 

which yields a covering number N^e) of the order of e~^/^ . 

• > e. In this case one easily sees that 

p 

\cl)ni{x) - </>ni(xO| < Yl - ^ni. (5) 

i=i 

We study the previous increment with the following result. 
Lemma 1. As n,p oo, h and ph — >■ oo, 
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and 

\W,ix) - W,ix')\ =0^^ (^^^ A l)) (6b) 

uniformly in j = 1, . . . ,p and in x, x' G [0, 1], where a Ab = min(a, b). 
Proof. 

Recall that Wj{x) = with the wj defined in ( |2.4[ ). In view of as- 

sumption (A. 3), it can easily be shown that the functions s;(x) defined in (2.4) 
satisfy 

for / = 0, 1, 2, uniformly in x E [0, 1] as n — )• oo and /i — )• 0. 
Further, it holds that 

1 (x-^uY ^ (^TT^) /(^)^^ = f^'^^^'' u'K{u)du + O (/I'+i) 

and as a consequence, we have 

/{l~x)/h 
u^K{u)du + o (h^^ (7) 



and 



^Wj(x) = S2{x)sq{x) - s\{x) = o[h?') +h?f'^{x) 

^{i-x)/h r{i-x)/h / Al-x)/h ^2 

u K{u)du I K{u)du — I / uK{u)du 

-x/h J —x/h \J—x/h 



(8) 



uniformly in x G [0, 1] as n — )• oo, /i — t- and ph — )■ oo. 

Now, the integral factor in Q is positive by virtue of the Cauchy-Schwarz 
inequality in L2{[0, 1]^). (For x far enough from the boundaries this factor reduces 
to f v?K{u)du by the moment properties of K and the compacity of its support 
K \s a. symmetric density function supported by [—1,1].) Moreover, being a 
continuous function of a; G [0, 1] it remains bounded away from zero and infinity 
so that X]j=i^j(^) is uniformly of order h?. Invoking Q and the compact 
support of K, one sees that Wj{x) = { ^^^ ^ {s2{x) — (x — Xj)si{x)) = 
O (:^K { ^!LJEl ] h^]. The two former facts on the numerator and denominator 



of Wj{x) produce (6a 
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It remains to compare Wj{x) and Wj{x') for arbitrary j,x,x'. First observe 
that if either |x — Xj| > /i or |x' — Xj| > h, then at least one of these weights 



is zero in which case (6b) reduces to (6a). We may thus assume that max(|x 



' — Xj\) < h. In view of the decomposition 

w,{x) - w,{x') = "ilf - w,{x') 

2^7 = 1 UJj\Xj 



„ Y.% iiwj{x) - Wjjx')) 
T.%iWj{x) 



(9) 



(6a), the fact that Wj{x) is of order /i^ and (2.4), the comparison of the weights 
Wj{x) and Wj{x') boils down to comparing si{x) and si{x') for / = 0, 1,2. 
Basic linear algebra shows that 



\si{x) - si{x')\ 



O 



\x — X 



A lU' 



(10) 



uniformly in x and x' . We now get from ^ and ^ that 



ph 



Wj{x) — Wj{x') 



{s2{x) — {x — Xj)si{x)) K 



ph 



{s2{x') — {x — Xj)si{x')) K 



X X j 



ph 



< 



K 



ph 



ph 



\s2{x) - {X - Xj)si{x)\ 



+ K 



ph^ ^ ||(s2(a:;) - S2{x)\ + \si{x){x -x)\ + \{x- Xj){si{x') - si(x))|| 



(11) 



Finally, putting together the fact that \Jx\ and \Jx'\ are of order ph (i.e. the 
non-null weights entering the sum Yl^=ii'^jix) — Wj{x')) in ^ are in a number 
of order ph), ([6a|), ([s]), and (10), one may conclude to (|6b|) without difficulty. □ 



Now, using Lemma ([T]) and the fact that \Jx\ and \Jx'\ are of order ph, one 
obtains from ^ the following bound: 



(j^niix) - 4>ni{x') 



< C I A 1 1 



(12) 



Since h^ > e hy assumption, it follows that for any x,x' such that \x — x'\ < 
gi+i//3/(7^ the distance between (f>ni{x) and (f>ni{x') {i = 1, . . . ,n) is smaller than 
e^ni- Therefore the covering number A^(e) is at most of the order of e~^~^^^ . 
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Finally, gathering the cases < e and /i^ > e, we see that A^(e) is at most 
of order max(e~^/'^, e^^^^/^) = g^i^i//^ for e < 1, which guarantees the fact that 
(log A^(e))"'^^^de < oo, i.e. the manageability of the (f>ni with respect to the 
envelop All the conditions of the functional CLT 10.6 of Pollard (1990) are 
thus met. Applying it, we get that Xn = \/"'X]j=i WjZ{xj) converges weakly in 
C([0, 1]) to a Gaussian process with mean zero and covariance R, as claimed. 



Remark 1. In the bivariate case (d = 2), the previous arguments carry over with 
a few simple modifications. In particular in Lemma ([T]), ( [6a[ ) transforms into 



\Wj{x)\ = 0{{phih2)-^K[h~^{x - Xj))) and ([6b]) becomes \Wj{x) -Wj{x')\ = 
O ((j?/ii/i2)~"'^(||/i~^(2; — x') II A 1)) . The manageability property is obtained ex- 
actly as when d = 1, by studying four cases according to the signs of h^ — e and 
/12 — e. The other conditions of the functional CLT come alike. 



A. 2 Control of the smoothed error process 

Let us denote by W{x) the vector of weight functions {Wi{x), . . . , Wp{x))'^ of 
the local linear estimator at x and by e(x) the smoothed error process Wj{x)£j. 
We will show that v^ll^loo converges to zero in probability as n — )• 00 by applying 
the well-known Borell's inequality 



P ^supX(t) > < 2exp (^A -E(^supX(t))^ ^ 



(13) 



holding for all centered, continuous Gaussian process X indexed by a set T and 
for aU A > E{snpt^T X (t)) , where = suptg^^ E(X2(t)) (e.g. Adler (1990) p. 
43). In the present context X = y/ne and T = [0, 1]. 

Before to apply (13), we must bound the quantities E(sup^g[o,i] V^^{x)) and 



nsup^.g[o,i] Var(e(j;)). For the first quantity, we use the classical entropy bound 
EfsupX(t)) < C / VlogiV(e) de (14) 

^ t&T ' Jo 



(see e.g. Adler (1990) p. 106) where C > is a universal constant and A'^(e) is 
the smallest number of balls needed to cover T in the pseudo-metric d{s, t) = 
(E(A'(s) — A'(t))^)^/^. Here, with assumption (A. 5) on the common covariance 
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matrix V of the random vectors {en, . . . , e^p)^, i = 1, . . . , n, we have 



Ei^/TiY^{Wj{x) - Wj{x'))e. 

= {W{x) -W{x')yv{W{x) -W{x')) 

< \\V\\x\\W{x)-W{x')f 

< C\\W{x) -W{x')f 

where ||V|| denotes the largest eigenvalue of V. It follows from Lemma [T] that 

C 



d{x, x') < 



X — X 



h 



A 1 



and thus, for all n > 1 and e > 0, it holds that 

N{e) = 1 

Plugging (16) in (|14|) we obtain 



— vp/i 



ife<^. 



(15) 



(16) 



E 



sup y/ne{x) 

a;6[0,l] 



< c 



C 

\/ph 



c 



^-log(pl/2/j3/2g) 

exp(— u^) du 



hVph J^\og{c/h) 



< 



C 



h\/ph 



(17) 



after using the change of variable u = \J — log(p^/^/i'^/^e), an integration by parts, 
and the classical tail probability bound 4'{t)dt < x^^(f){x) (with x > and 
(j){t) = (27r)-^/2gxp(-t2/2)). Hence it sufHces that /i and log{h)/ph 
as n — )• oo to ensure that E(sup^.g[o,i] \/ne{x)) — )• 0. After simple algebraic 
manipulations of the condition \og{h)/ph — t- together with the rates n = o{p'^) 
in (A. 4) and nh'^ — t- in Theorem 1, it turns out that this condition is equivalent 
to /i — 7- 0, n^/^^*^) log(|?) = o{p) in (A. 4), and {p/\og{p))h — )■ oo in Theorem 1. 
We thus use the latter conditions which are more explicit than the former. 

Turning to the variance of e, we utilize again Lemma [T] and (A. 5) to get for 
all X G [0, 1] 



n 



n 



nph 



(18) 
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Borell's inequality (13) may now be applied to X = -y/ne with A set to an 
arbitrary e > 0. Under the conditions /i — )■ and ph\ log(/i)| — >• oo as n — )■ oo, we 



deduce from (17) and (18) that 



P( ^/n sup e{x) > e ] = O (exp [-Cphe^)) , 



(19) 



which yields the uniform convergence in probability of the smoothed error process 
^/ne to zero as requested. 

Remark 2. The former arguments extend to the bivariate case simply by replac- 



ing h with hih2 and \{x — x') / h\ with\\{x — x)' /h\\ m (15)-(19). In particular (|T5|) 
extends to the case d = 2 thanks to Remark^ and a simple partitioning of [0, 1]^, 
while the other equations come in a straightforward way. The conclusion then 
holds under the conditions ||/i|| — )• 0, p/ii/i2 — ^ oo and (p/ii/i2)~"'^ log(/ii/i2) — )• 0, 
or equivalently \\h\\ — )• and p/ \og{p){hih2) 



oo. 
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